With the rapid growth of mobile communication, SMS spam has become a significant challenge, leading to financial fraud, phishing attacks, and privacy breaches. This paper proposes a machine learning-based SMS spam detection system with a sender-blocking feature.
To categorize messages as safe, suspicious, or spam, the system combines text preprocessing, TF-IDF feature extraction, and a Naïve Bayes classifier. To enhance security, the system issues warnings and blocks senders after repeated spam attempts. Experimental results (97.5% accuracy on a 5,000-message dataset) indicate that the approach is effective at reducing spam and protecting users from malicious senders.
Introduction
With the rise of mobile phone usage, SMS (Short Message Service) has become a widely used communication tool. However, this has led to a surge in SMS spam—irrelevant and often fraudulent messages. The limited computational resources of mobile devices make it challenging to detect and filter such spam effectively.
Project Overview
This project proposes a hybrid spam detection system using Multinomial Naïve Bayes and TF-IDF feature extraction to classify SMS messages into:
Safe
Suspicious
Spam
The system also integrates a sender-blocking mechanism that issues warnings and blocks a sender after three offenses.
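The three-way labelling and the three-offense rule can be sketched together as follows. This is a minimal sketch, not the authors' implementation: it assumes the classifier outputs a spam probability and that the Suspicious category is obtained by thresholding that probability. The thresholds 0.5 and 0.8, the choice to warn on the first two offenses, and the names `label_message` and `SenderBlocker` are all hypothetical.

```python
from collections import defaultdict

# Assumed thresholds on the classifier's spam probability (hypothetical values).
SUSPICIOUS_THRESHOLD = 0.5
SPAM_THRESHOLD = 0.8
MAX_OFFENSES = 3  # a sender is blocked on its third Spam/Suspicious message


def label_message(p_spam):
    """Map a spam probability to Safe, Suspicious or Spam."""
    if p_spam >= SPAM_THRESHOLD:
        return "Spam"
    if p_spam >= SUSPICIOUS_THRESHOLD:
        return "Suspicious"
    return "Safe"


class SenderBlocker:
    """Counts Spam/Suspicious messages per sender and blocks repeat offenders."""

    def __init__(self, max_offenses=MAX_OFFENSES):
        self.max_offenses = max_offenses
        self.offenses = defaultdict(int)  # sender -> offenses recorded so far
        self.blocked = set()

    def process(self, sender, p_spam):
        """Return the action taken for one incoming message."""
        if sender in self.blocked:
            return "dropped"  # messages from blocked senders are discarded
        if label_message(p_spam) == "Safe":
            return "delivered"
        self.offenses[sender] += 1
        if self.offenses[sender] >= self.max_offenses:
            self.blocked.add(sender)
            return "blocked"
        return "warning"  # first and second offenses only trigger a warning


# Usage: one sender sends three flagged messages, then a harmless one.
blocker = SenderBlocker()
actions = [blocker.process("sender-42", p) for p in [0.9, 0.6, 0.95, 0.1]]
print(actions)  # ['warning', 'warning', 'blocked', 'dropped']
```

Keeping the offense counter separate from the classifier means the blocking policy (threshold values, number of strikes) can be tuned without retraining the model.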
Problem Statement
SMS spam poses a threat by spreading misinformation and enabling fraudulent activity. Traditional spam filters often fail to address this threat adequately. This project addresses the need for a more accurate, automated, and proactive spam detection and prevention system.
Related Work
A literature survey highlights recent advancements:
Sharma et al. (2021): SVM with word embeddings
Kumar & Reddy (2022): Hybrid CNN-LSTM model
Patel et al. (2023): Naïve Bayes + TF-IDF approach
Ali & Singh (2024): Classification + sender blocking mechanism
Existing Spam Detection Methods
Standard Filtering: Rule-based filters using headers, blacklists, and user-defined rules.
Enterprise Filtering: Server-based filters with message scoring and list-based auto-blocking.
Case-Based Filtering: Machine learning-based classification with training and testing phases.
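For contrast with the learned approach, the standard (blacklist) and enterprise (scoring) filters described above can be sketched as a blacklist lookup followed by a fixed keyword score. The blacklist entry, keywords, weights, and threshold below are purely illustrative assumptions, and `rule_based_score` and `rule_based_filter` are hypothetical names, not taken from any cited system.

```python
import re

# Illustrative rules only (assumed values, not from any cited system).
BLACKLIST = {"+15550199"}  # list-based blocking of known senders
KEYWORD_WEIGHTS = {"free": 2.0, "winner": 3.0, "claim": 2.0, "urgent": 1.5}
URL_WEIGHT = 2.0
SCORE_THRESHOLD = 4.0


def rule_based_score(text):
    """Sum fixed weights for suspicious keywords and for any embedded link."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(KEYWORD_WEIGHTS.get(w, 0.0) for w in words)
    if re.search(r"https?://|www\.", text.lower()):
        score += URL_WEIGHT
    return score


def rule_based_filter(sender, text):
    """Blacklist check first, then compare the message score to a threshold."""
    if sender in BLACKLIST:
        return "spam"
    return "spam" if rule_based_score(text) >= SCORE_THRESHOLD else "ham"


print(rule_based_filter("+15550123", "WINNER! Claim your FREE prize at www.example.com"))  # spam
```

Rules like these are easy to audit, but every keyword and weight has to be maintained by hand; a case-based (machine learning) filter instead learns such weights from labelled training data.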
Proposed System
Main Components:
Data Preprocessing: Cleaning, stopword removal, normalization
Feature Extraction: Using TF-IDF
Classification: Using Multinomial Naïve Bayes
Sender Blocking: Warnings, then blocking after three spam or suspicious messages from the same sender
Visualization: Charts for performance analysis
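The preprocessing and feature-extraction components listed above can be sketched with the standard library alone. This is an illustrative sketch, not the authors' code: the tiny stopword list and the `url`/`num` placeholder tokens are assumptions, and `preprocess`, `fit_idf`, and `tfidf` are hypothetical names. The IDF formula and L2 normalization mirror the defaults of scikit-learn's `TfidfVectorizer` (`smooth_idf=True`, `norm="l2"`).

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list (assumption); real systems use a fuller list.
STOPWORDS = {"a", "an", "the", "to", "is", "you", "your", "for", "of", "and", "in", "on", "at"}


def preprocess(text):
    """Normalize, clean and tokenize one message, then drop stopwords."""
    text = text.lower()                                      # normalization: lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " url ", text)   # replace links with a token
    text = re.sub(r"\d+", " num ", text)                     # replace digit runs with a token
    tokens = re.findall(r"[a-z]+", text)                     # cleaning: keep alphabetic tokens
    return [t for t in tokens if t not in STOPWORDS]


def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Raw term count times IDF, L2-normalized; terms unseen in training are ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


# Usage on three toy messages.
messages = ["Free prize! Claim now", "See you at lunch", "Free lunch today?"]
docs = [preprocess(m) for m in messages]
idf = fit_idf(docs)
vectors = [tfidf(d, idf) for d in docs]
```

Terms that appear in fewer messages (such as "prize") receive a higher IDF than common ones (such as "free"), which is what lets TF-IDF down-weight words that carry little discriminative information.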
Architecture Diagrams Included:
Flowchart of SMS processing
System architecture
UML Use Case diagram (user and admin roles)
Methodology
Dataset: 5,000 SMS messages (from Kaggle)
Phases:
Data cleaning
Feature extraction
Model training/testing (80:20 split)
Classification
Sender blocking
Result visualization
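The training, testing, and classification phases can be sketched without external libraries as below. In practice scikit-learn's `MultinomialNB` and `train_test_split` would normally be used; the hand-rolled `SimpleMultinomialNB` and `train_test_split_80_20` here are hypothetical stand-ins, and the smoothing value `alpha=1.0` and the random seed are assumptions. Features are sparse `{term: weight}` dictionaries, so either raw token counts or TF-IDF weights can be passed in.

```python
import math
import random
from collections import Counter, defaultdict


def train_test_split_80_20(X, y, seed=42):
    """Shuffle indices reproducibly and hold out 20% of the data for testing."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * 0.2)
    train, test = idx[n_test:], idx[:n_test]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


class SimpleMultinomialNB:
    """Multinomial Naive Bayes over sparse {term: weight} features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, X, y):
        self.classes = sorted(set(y))
        class_counts = Counter(y)
        self.log_prior = {c: math.log(class_counts[c] / len(y)) for c in self.classes}
        totals = {c: defaultdict(float) for c in self.classes}  # class -> term -> summed weight
        for features, label in zip(X, y):
            for term, weight in features.items():
                totals[label][term] += weight
        self.vocab = {term for features in X for term in features}
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                t: math.log((totals[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict_proba(self, features):
        """Posterior P(class | message), normalized with the log-sum-exp trick."""
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in features.items() if t in self.vocab)
            for c in self.classes
        }
        m = max(scores.values())
        z = sum(math.exp(s - m) for s in scores.values())
        return {c: math.exp(s - m) / z for c, s in scores.items()}

    def predict(self, features):
        proba = self.predict_proba(features)
        return max(proba, key=proba.get)


# Usage on a toy labelled set (token counts as features).
X = [{"free": 1, "prize": 1}, {"win": 1, "free": 1}, {"claim": 1, "prize": 1},
     {"lunch": 1, "today": 1}, {"see": 1, "lunch": 1}, {"meeting": 1, "today": 1}]
y = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = SimpleMultinomialNB().fit(X, y)
print(model.predict({"free": 1, "prize": 1}))  # spam
```

Because `predict_proba` returns a posterior rather than only a hard label, its spam probability is the kind of score to which Safe/Suspicious/Spam thresholds could be applied.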
Testing & Results
Accuracy: 97.5%
Precision: 96.8%
Recall: 97.1%
F1-Score: 96.9%
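Under the standard definitions, the reported F1-score is consistent with the reported precision and recall, since F1 is their harmonic mean. This check assumes spam is the positive class; the text does not state whether the scores are binary or averaged over the three categories.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \\[4pt]
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2 \times 0.968 \times 0.971}{0.968 + 0.971}
     = \frac{1.879856}{1.939}
     \approx 0.969498 \approx 96.9\%.
\end{aligned}
```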
Evaluation Tools: Confusion matrix, pie charts, bar graphs
Outcome: High reliability with minimal false positives/negatives
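The confusion matrix listed among the evaluation tools, together with per-class precision, recall, and F1, can be computed roughly as follows. This is a minimal stdlib sketch: it assumes the system's three labels, although the reported metrics may have been computed on a binary spam/ham split (the text does not say), and `confusion_matrix` and `per_class_scores` are hypothetical names.

```python
from collections import Counter

LABELS = ["Safe", "Suspicious", "Spam"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_scores(matrix, labels=LABELS):
    """Precision, recall and F1 for each class, read off the confusion matrix."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: predicted as this class
        actual = sum(matrix[i])                    # row sum: truly this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Usage on five toy predictions.
y_true = ["Spam", "Spam", "Safe", "Safe", "Suspicious"]
y_pred = ["Spam", "Safe", "Safe", "Safe", "Spam"]
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[2, 0, 0], [0, 0, 1], [1, 0, 1]]
```

Off-diagonal cells are the false positives and false negatives behind the outcome stated above; the same counts can feed the pie charts and bar graphs.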
The system not only detects spam but also actively prevents future spam by blocking repeat senders, thereby improving user safety and trust.
Conclusion
This paper presents a machine learning-based SMS spam detection and sender blocking system. The integration of TF-IDF with Naïve Bayes ensures efficient classification, while the sender blocking mechanism enhances user safety. In the future, deep learning models such as transformers can be incorporated for improved accuracy. Additionally, deployment as a mobile app with real-time filtering will make the system more practical for end-users.
In this project, a modified algorithm for identifying SMS spam has been proposed. Its primary objective is to mitigate the problems that fraudulent messages cause for individuals and mobile users. An extensive literature review was conducted to identify an appropriate model for the problem statement, and the experimental results are expected to demonstrate superior accuracy compared with traditional algorithms.
References
[1] Sharma, A., et al. "SMS Spam Detection Using SVM and Word Embeddings," IEEE Access, 2021.
[2] Kumar, R., & Reddy, S. "Hybrid CNN-LSTM for Spam SMS Classification," Journal of AI Research, 2022.
[3] Patel, V., et al. "Naïve Bayes and TF-IDF for SMS Spam Detection," Springer, 2023.
[4] Ali, M., & Singh, P. "Integrated Spam Detection and Sender Blocking System," Elsevier, 2024.